- Problem Overview
- What Data is Used?
- Regression Analysis
- Spatial Analysis
November 20th, 2017
Does opening a new station affect ridership at other stations near it?
How does proximity to NYC subway stations affect ridership?
Observations: 9,884,307
Variables: 11
$ tripduration     <dbl> 997, 1904, 305, 250, 464, 1118, 394, 1449, 42...
$ starttime        <dttm> 2013-07-01 06:00:16, 2013-07-01 06:00:30, 20...
$ stoptime         <dttm> 2013-07-01 06:16:53, 2013-07-01 06:32:14, 20...
$ startstationid   <chr> "436", "294", "385", "271", "477", "488", "30...
$ startstationname <chr> "Hancock St & Bedford Ave", "Washington Squar...
$ endstationid     <chr> "467", "375", "440", "390", "522", "497", "32...
$ endstationname   <chr> "Dean St & 4 Ave", "Mercer St & Bleecker St",...
$ bikeid           <dbl> 16199, 20281, 18143, 16370, 15497, 15502, 161...
$ usertype         <chr> "Subscriber", "Subscriber", "Subscriber", "Su...
$ birthyear        <dbl> 1979, 1949, 1988, 1962, 1975, 1957, 1963, 195...
$ gender           <chr> "2", "1", "1", "1", "1", "1", "2", "1", "1", ...
Time: weekday mornings, 6-10 am
Date Range:
[1] "2013-07-01 UTC" "2017-07-31 UTC"
Stations:
[1] 749
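The weekday-morning restriction can be sketched as a simple filter on trip start times. This is a minimal illustration in Python, not the code behind these slides; the function name is hypothetical, and treating 10:00 as an exclusive upper bound is an assumption, since the slide does not say whether trips starting exactly at 10 am are kept.

```python
from datetime import datetime

def is_weekday_morning(start: datetime) -> bool:
    """True for trips starting Monday-Friday from 06:00 up to (not including) 10:00."""
    # weekday() returns 0 for Monday through 6 for Sunday.
    return start.weekday() < 5 and 6 <= start.hour < 10

# The first two start times appear in the trip data above;
# the third is a made-up weekend time for contrast.
starts = [
    datetime(2013, 7, 1, 6, 0, 16),  # Monday morning, kept
    datetime(2013, 7, 1, 6, 0, 30),  # Monday morning, kept
    datetime(2013, 7, 6, 8, 15, 0),  # Saturday morning (hypothetical), dropped
]
morning_starts = [s for s in starts if is_weekday_morning(s)]
```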
Only the stations labeled in red will be used in the analysis.
These stations have the most longitudinal data, and their surrounding areas are more homogeneous.
The analysis starts in 2014, leaving the first 6 months of data as a "burn-in" period.
Observations: 1,673
Variables: 6
$ day_trip <dttm> 2013-01-01, 2013-01-02, 2013-01-03, 2013-01-04, 2013...
$ PRCP     <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,...
$ SNOW     <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
$ SNWD     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ TMIN     <int> 26, 22, 24, 30, 32, 34, 37, 35, 39, 40, 37, 42, 43, 3...
$ TMAX     <int> 40, 33, 32, 37, 42, 46, 45, 48, 49, 47, 46, 47, 50, 5...
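One plausible way to combine the two tables is to count trips per day, drop the burn-in period, and attach each day's weather record. The sketch below is in Python and rests on assumptions: that `day_trip` is the join key, that the burn-in cutoff is 2014-01-01, and that an inner join is wanted. The helper names and all sample rows are hypothetical.

```python
from collections import Counter
from datetime import date, datetime

ANALYSIS_START = date(2014, 1, 1)  # assumed cutoff; earlier trips are burn-in

def daily_counts(starts):
    """Count trips per calendar day, skipping the burn-in period."""
    return Counter(s.date() for s in starts if s.date() >= ANALYSIS_START)

def join_weather(counts, weather):
    """Attach each day's weather record to its trip count (inner join on day)."""
    return [
        {"day_trip": day, "trips": n, **weather[day]}
        for day, n in sorted(counts.items())
        if day in weather
    ]

# Hypothetical rows for illustration only; not taken from the real data.
starts = [
    datetime(2013, 7, 1, 6, 0, 16),  # falls in the burn-in period, excluded
    datetime(2014, 1, 2, 7, 30, 0),
    datetime(2014, 1, 2, 8, 45, 0),
    datetime(2014, 1, 3, 9, 10, 0),
]
weather = {
    date(2014, 1, 2): {"PRCP": 0.0, "TMIN": 20, "TMAX": 33},
    date(2014, 1, 3): {"PRCP": 0.1, "TMIN": 9, "TMAX": 18},
}
rows = join_weather(daily_counts(starts), weather)
```

Each element of `rows` is then one day with its trip count and weather covariates, which is the shape a daily regression would need.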
Is proximity to subway stations associated with a decrease in Citibike usage?
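Answering this question requires a proximity measure. One common choice, assumed here rather than taken from the slides, is the great-circle (haversine) distance from each bike station to its nearest subway station. The Python sketch below uses placeholder coordinates, since the trip table above shows no latitude or longitude columns; the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_subway_km(bike_station, subway_stations):
    """Distance from one bike station to the closest subway station."""
    lat, lon = bike_station
    return min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in subway_stations)

# Placeholder coordinates for illustration; not real station locations.
bike_station = (40.70, -73.99)
subway_stations = [(40.71, -73.99), (40.75, -73.98)]
distance = nearest_subway_km(bike_station, subway_stations)
```

The resulting distance could then enter a regression as a covariate alongside the weather variables.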